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We present a comparative study in which "pebble game" rigidity analysis 
is applied to multiple protein crystal structures, for each of six different protein 
families. We find that the mainchain rigidity of a protein structure at a given 
hydrogen-bond energy cutoff is quite sensitive to small structural variations, and 
conclude that the hydrogen bond constraints in rigidity analysis should be chosen 
so as to form and test specific hypotheses about the rigidity of a particular 
protein. Our comparative approach highlights two different characteristic patterns 
("sudden" or "gradual") for protein rigidity loss as constraints are removed, in 
line with recent results on the rigidity transitions of glassy networks. 
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1. Introduction 

It is a common goal in biophysics to represent the flexibility of a protein and study its 
large-scale motion without incurring the full computational cost of molecular dynamics 
simulations. One popular family of approaches is based on normal-mode analysis 
applied to a full or simplified representation of the protein structure [1-10], with the 
aim of representing large-scale conformation change in terms of a reduced set of low- 
frequency motions [11]. Another approach is to divide up the protein structure into 
relatively rigid sections or domains, connected together by flexible regions or "hinges" . 
This can be done using a variety of structure-based approaches [12-17]. 

In this paper we concern ourselves with the "pebble game" [18], an integer 
algorithm for rigidity analysis. By matching degrees of freedom against constraints, 
it can rapidly divide a network into rigid regions and floppy "hinges" with excess 
degrees of freedom. The program First implements this algorithm for protein crystal 
structures [19]. The rigid units in a protein structure may be as small as individual 
methyl groups or large enough to include entire protein domains containing multiple 
secondary-structure units. The division of a structure into rigid units is referred to as 
a Rigid Cluster Decomposition (RCD). 

Rigidity analysis has been used to study phenomena such as virus capsid assembly 
[20] and protein folding [21,22]. The coarse-graining provided by a RCD also forms 
the basis of simulation methods aiming to explore the large-amplitude flexible motion 
of proteins: the ROCK algorithm [23] and more recently the Froda geometric 
simulation algorithm [24], which has been applied in various studies of protein 
flexibility [25-29], and the rigidity-enhanced elastic network model [30]. 

The results of rigidity analysis on proteins depend upon the set of constraints 
that are included, with the user setting an energy "cutoff" which determines the 
set of hydrogen bonds to include in the analysis, (see section [2). However, previous 
studies using First have used widely differing, sometimes contradictory, cutoff values 
and methods of constraint selection — we give a brief review of the situation in 
[Appendix A[ This methodological issue not only makes it more difficult for scientists 
to adopt pebble-game rigidity analysis as a method, but also raises issues in the 
interpretation of results. There is at present no clear guidance on the "correct" choice 
of cutoff value; nor is it clear how comparable are the results of rigidity analysis using 
a given cutoff value on slightly different protein structures. 

Hence the primary motivation for our study is to fill this gap by explicitly 
comparing the results of rigidity analysis on groups of very similar crystal structures. 
We concentrate particularly on eukaryotic cytochrome C while also considering five 
other proteins (hemoglobin, myoglobin, a-lactalbumin, trypsin and HIV-1 protease). 
For each protein structure we observe the pattern of rigidity loss during the progressive 
removal of hydrogen bonds, or "rigidity dilution" [21,22]. We define mainchain rigidity 
as a measure of the rigidity of the protein backbone in order to describe the rigidity 
loss during dilution. On the basis of this study we comment on the selection of cutoff 
values and the interpretation of rigidity analyses. 

The second motivation for our study is to observe the pattern of rigidity loss 
during dilution. Previous studies on protein folding [21] have drawn comparisons 
between the folding transition of proteins and the rigidity transition of glassy networks. 
A recent study [31] found that the rigidity transition in glasses could display either 
first-order or second-order behaviour depending on the character of the constraint 
network. In the first case, a small change in the constraints causes a sudden transition 
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from an entirely floppy state to one in which the entire system becomes rigid. In the 
second, rigidity develops in a percolating rigid cluster which initially involves only a 
small proportion of the network and then gradually increases in size as more constraints 
are introduced. Our data on rigidity dilution shows that both types of transition are 
possible in proteins, with four of our proteins typically displaying "gradual" rigidity 
change and two (trypsin and HIV-1 protease) displaying "sudden" rigidity change. 

2. Materials and Methods 

2.1. Protein selection 

We have chosen sets of proteins from the protein data bank (PDB) [32] to obtain 
similar crystal structures for our comparison, as summarised in Table [TJ We sought 
particularly (i) examples of the same protein from different organisms, e.g. cytochrome 
C proteins from multiple different eukaryotic mitochondria, and (ii) protein structures 
obtained under different conditions of crystallisation, e.g. in complex with different 
ligands, proteins or substrates. In the present study we will only investigate non- 
membrane proteins becuase the default treatment of hydrogen bonds and hydrophobic 
tethers in First is based on the assumption that the protein exists in a polar solvent 
(cytoplasm) rather than being within a hydrophobic or amphiphilic environment as for 
membrane-bound proteins. Proteins in a membrane environment can still be handled 
but this requires hand-editing of the constraint network. Rigidity analysis is best 
carried out on crystal structures with high resolution, so that we can have confldence in 
the accuracy of the atomic positions when constructing the hydrogen-bond geometries. 
We therefore concentrated on X-ray crystal structures with resolutions of better than 
2.5A. 

From each PDB crystal structure we extracted a single protein chain, eliminating 
all crystal water molecules, but retaining important hetero groups such as the 
porphyrin/heme units of cytochrome C and hemoglobin. The PyMOL visualisation 
software [33] proved very useful for this purpose. We add the hydrogens that are absent 
from X-ray crystal structures, using the Reduce software [34] which also performs 
necessary flipping of side chains. After the addition of hydrogens we renumbered the 
atoms using PyMOL again to produce flies usable as input to First [19,21]. In the 
case of HIV protease we analysed the homodimer unit, as in [19]. 

2.2. Rigidity analysis and dilution 

The energy of each potential hydrogen bond in the processed structure is calculated 
in First using the Mayo potential [35] ; the distance-dependent part of this potential 
is shown in Figure [H For the dilution. First performs an initial rigidity analysis 
including all bonds with energies of kcal/mol or lower; bonds are then removed in 
order of strength, gradually reducing, or "diluting" , the rigidity of the structure. 

An example of this rigidity dilution for a given protein is shown in Figure 
for the IHRC horse cytochrome C structure. The horizontal axis represents the 
protein's linear primary structure. Flexible areas of the polypeptide sequence are 
shown as horizontal thin black lines while areas lying within a rigid cluster are shown 
as thicker coloured blocks. Colour is used to differentiate which residues belong to 
which rigid cluster. The three-dimensional protein fold makes it possible for residues 
that are widely separated along the backbone to be spatially adjacent and form a 
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Table 1. List of all the proteins, their organism of origin, PDB codes as well 
the Figures in which they appear 



Protein 


Organism 


PDB ID 


Figure Comments 


Cytochrome C 


Horse 


IHRC 


[6] uncomplexed 






IWE J 


complexcd with antibody E8 






1U75 


complexed with peroxidase 






ICRC 


at low ionic strength 


Cytochrome C 


Tuna 


5CYT 


[6] fcrrieytochrome 






1154 


2FE;1ZN mixed-metal porphyrins 






1155 


2ZN:1FE mixed-metal porphyrins 






ILFM 


Gobalt(iii J-subsituted 


Cytochrome C 


Rice 




1 tti 




Bonito 


ICYC 






Bacteria 


1 A '7\F 






Tuna 


1155 






Yeast 


i 1 V_/l^ 








2YCC 




Myoglobin 


Horse 


IDWR 


UP 




Whale 


IHJT 






Turtle 


ILHS 




a-lactalbumin 


Baboon 


lALC 


Ef 




Human 


IHML 






Goat 


IHFY 






Human 


1A4V 






Guinea pig 


IHFX 






Gattle 


1F6R 




Hemoglobin 


Human 


1 A3N 


_ 

ITU deoxy 


(a chain) 




2DN1 


oxy 






2DN2 


deoxy 






2DN3 


carbonmonoxy 




Goose 


1A4F 






Rice 


1D8U 






Bacteria 


IDLW 






Alga 


IDLY 






Cattle 


1G09 






Worm 


1KR7 






Clam 


IMOH 




HIV-1 Protease 


Virus 


IHTG 


[7t homodimers with inhibitors bound 






4HVP 








7HVP 








8HVP 








9HVP 




Tiypsin 


Salmon 


lAOJ 


ET 




Cattle 


1AQ7 








lAUJ 






Pig 


lAVW 






Pig 


lAVX 






Cattle 


1AZ8 






Rat 


IBRA 








IBRB 








IBRC 






Cattle 


IBTH 






Salmon 


IBZX 






Human 


1H4W 








IHPT 






Cattle 


IKII 








IKIJ 








IKIM 








IKIN 








IKIO 








IKIP 






Pig 


ILDT 






Human 


ITRN 








2RA3 






Rat 


3TGI 
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Figure 1. Dependence of hydrogen bond energy E in First on tlie donor-acceptor 
distance. The shaded region indicates how an distance variation of ±0.1 A can lead 
to a variation in the bond energy of more than 1 kcal/mol. 



single rigid cluster. The vertical axis on the dilution plot represents the dilution of 
constraints by progressively lowering the cutoff energy for inclusion of hydrogen bonds 
in the constraint network. Each time the rigid cluster analysis of the mainchain a- 
carbon atoms (Cq) changes as a result of the dilution, a new line is drawn on the 
plot, labelled with the energy cutoff and with the network mean coordination for the 
protein at that stage. We should stress that the RCD is always performed over the 
entire protein structure (mainchain and sidechain atoms) and a dilution is performed 
for every hydrogen bond removed from the set of constraints, typically several hundred 
bonds for a small globular protein. The dilution plot is then a summary concentrating 
on the rigid-cluster membership of the Cq atoms defining the protein backbone. 

2.3. Mainchain rigidity loss during dilution 

Dilution plots of very similar protein structures can be compared directly as shown in 
FigurelH This form of comparison, however, becomes unwieldy when comparing large 
numbers of structures, and can obscure differences in the hydrogen-bond energy scale. 
For glassy networks [31] the overall degree of rigidity of the structure was measured by 
the number of atoms in the largest spanning rigid cluster in a network with periodic 
boundary conditions. Since the protein is not a periodic structure, we measure its 
overall rigidity by considering how many of its residues are included in large rigid 
clusters. 

In Figure [3fa) we show the number of Cq contained within the larger N rigid 
clusters of the horse cytochrome C structure IHRC, for which the total number of Cq 
atoms equals Afca ^ 105. It is clear that only the first few rigid clusters (numbered 
1-5) contain more than one Cq while higher-numbered clusters do not contain more 
than one Cq and do not represent two or more residues forming a single rigid unit. In 
Figure [3jb) we show the fraction /jv of Cq contained in the first N cluster, defined as 




(1) 
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Figure 2. (a) Dilution plot for horse cytochrome C from the IHRC structure. 
Flexible regions of the polypeptide chain appear as black thin lines, whereas rigid 
portions appear as coloured along the protein chain with Ca labelled from 1 to 
105. The second colum on the left indicates the mean number (r) of bonded 
neighbours per atom as the energy cutoff E changes. When E decreases (left- 
most column), rigid clusters break up and more of the chain becomes flexible. 
Colour coding shows which atoms belong to which rigid cluster. (b,c,d and e) 
Rigidity distribution for horse Cytochrome C from the IHRC structure in 3D. 
These figures represent in grey the fiexible regions and in colour the largest rigid 
regions for the native state at energy cutoffs (b) E = 0.000, (c) E = 1.007, (d) 
E = 2.073 and (e) E = 3.082, respectively. For each figure, the colour coding 
correlates with the colour coding given in (a). The arrows in (c) and (d) indicate 
two smaller rigid clusters shown in "stick" representation for clarity. The heme 



Comparative analysis of rigidity across protein families 7 




-2 -1 -5 -4 -3 -2 
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(a) (b) 

Figure 3. (a) The number njv of Ca atoms contained within rigid clusters (RC) 
N = 1, ... ,5 and 10 of the IHRC structure. Smaller, higher-numbered clusters 
do not contain more than one Ca- (b) The fraction / of the protein's Ca atoms 
contained within clusters 1 to iV. The line corresponding to the N = 5 data has 
been shaded to show that the inclusion of rigid clusters 1 through 5 captures the 
large-scale rigidity of the protein. 



for, e.g. those Cq. lying within rigid clusters iV = 1 to 5 and also 10. The inclusion of 
the first five rigid clusters captures the large-scale rigidity of the protein; the difference 
between N = 5 and = 10 is minimal. We therefore use the N — 5 measure, f5{E), 
to quantify protein rigidity hereafter, which we will refer to as mainchain rigidity. We 
emphasize that we have also computed all results presented here for TV = 4 and = 6 
with quantitatively similar and qualitatively identical results. 

It is worth noting the "stepped" appearance of our graphs. This is because a 
given pattern of rigidity persists as the cutoff is lowered until at a specific value it 
changes and a certain amount of rigidity is lost. 

2-4-. Structural comparison by RMSD 

When dealing with slightly varying crystal structures of the same protein, we quantify 
the structural variation by aligning the Cq, atoms of two structures and obtaining the 
root-mean-square deviation between Cq. positions. 



d = 



1 ^ 



4 (2) 



where da is the distance between the Cq atoms of residue i in the aligned structures. 



3. Results and discussion 



3.1. Comparing rigidity of very similar proteins: cytochrome C 

In Figure |4] we show dilution plots for four mitochondrial cytochrome C structures 
obtained from horse crystallised under different conditions as detailed in Table [T] 
The structural variations between these four structures are small (Table [5^), the 
largest being 0.572A between 1U75 and IWEJ; for comparison, Minary and Levitt [36] 
consider structures within d ~ 4A as "near-native" . 
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O.OOO 

0.000 2.490 

-0.013 2.475 

-0.021 2.474 

-0.092 2.458 

-0,125 2,453 

-0,758 2,430 

-0.859 2,429 

-1.007 2,428 

-1,054 2,427 

-1,165 2,425 

-1.269 2,424 

-1.308 2.424 

-2.073 2.418 

-2.118 2,417 

-2.214 2,415 

-2.503 2,412 

-2.868 2,409 

-3.082 2,407 

-3,123 2,406 

-3,245 2,404 

-4,048 2.394 

-4.259 2,393 



(a) IHRC: uncomplcxcd 



0.000 

0.008 2.482 

-0.017 2.474 

-0.078 2,452 

-0.154 2.444 

-0.475 2.430 

-0.540 2.429 

-0.640 2.429 

-0.844 2.428 

-1.208 2,424 

-1,256 2,422 

-1.406 2,419 

-2.175 2,416 

-2,508 2,415 

-2,555 2,414 

-2,869 2,411 

-3.069 2,407 

-3.591 2,401 

-3.626 2,400 

-3.731 2,399 

-3.906 2.398 

-4.102 2.396 



(b) IWEJ: antibody complex 



(c) 1U75: peroxidase complex 



0,000 

-0.020 2,478 

-0.254 2.458 

-0,344 2.454 

-0,363 2,453 

-0,464 2,449 

-0,481 2,447 

-0,573 2.447 

-0.853 2.444 

I 2,442 

1.061 2,441 

1,274 2,440 

-1,305 2.438 

-1,384 2.434 

1,942 2,429 

1.999 2,428 

! 2,427 



-3.411 



2,418 



(d) ICRC: low ionic strength 



Figure 4. Dilution plots for four crystal structure of horse cytochrome C. 
The four structures are very similar to each other (see text) and display similar 
patterns of rigidity loss. The central portion of the protein sequence breaks up 
into smaller clusters (e.g. close to E=-l for IHRC and E=-0.7 for IWEJ) and 
then becomes entirely flexible, while the rigidity of the two ends of the sequence, 
around residues 5-15 and 90-105, persists longer; these portions are Q-helical in 
secondary structure. 



From\To: IHRC ICRC IWEJ From\To: 5CYT 1155 1154 

, . ICRC 0.32 — — 1155 0,27 — — 

IWEJ 0,318 0,321 — ^ > 1154 0.2668 0,041 — 

1U75 0,472 0,53 0,572 ILFM 0,286 0,116 0,087 



Table 2. Root-mean-square deviation in A for Ca positions among (a) four horse 
cytochrome C structures and (b) four tuna cytochrome C structures, showing the 
similarity of the structures. 



The patterns of rigidity loss in Figure 3] appear quite similar on first inspection. 
Tlie central portion of the protein sequence breaks up into smaller clusters and then 
becomes entirely flexible, while the rigidity of the two ends of the sequence, around 
residues 5-15 and 90-100, persists longer; due to this persistence, these portions (a- 
helical in secondary structure) were identified in [22] as being the folding core of 
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-0.017 : 

-0.970 2.4: 
-1.02! 
-1.19! 

-1.446 2.4: 

-1.449 2.4; 

-1.619 2.4; 
-1.633 

-1.669 2.4; 

-2.867 : 

-3.298 2.4 

-3.337 ■ 

-3.604 2.4 

-3.672 2.4 



(a) 5CYT: normal fcrri cytochrome 



0.000 

-0.020 2.497 

-0J44 2.465 

-1,105 2.454 

-1,177 2.453 

-1.282 2.451 

-2.116 2.446 

-2.172 2.443 

-2.227 2.443 

-2,282 2,442 

-2,342 2.440 

-2.546 2.438 

-2.Sai 2.437 

-2.986 2.434 

-3.465 2.428 



(b) 1155: crystallised from 2Zn:lFe mix 



-0.015 2. 

-0.163 2. 

-0.259 2. 

-0.296 2. 

-0.938 2, 

-1.785 2, 

-1.958 2, 

-2.012 2. 

-2.052 2. 

-2.192 2. 

-2,283 2, 

-2,427 2, 

-2,988 2, 

-3.005 2. 

-3.133 2. 

-3.315 2. 

-3.337 2, 

-3.623 2, 

-3.778 2, 

-4.163 2. 

-4.247 2. 



(1.004 

(1.005 
-0.923 2. 
-0.975 2. 




(c) 1154: crystallised from lZn:2Fe mix 



(d) ILFM: with Co replacing Fe 



Figure 5. Rigidity dilutions for four forms of tuna cytochrome C crystallised 
with different metal ion content in the heme groups, (a) normal Fe, (b) from a 
mixture with 2Zn:lFe, (c) from a mixture with 2Fe:lZn, (d) with Co. 



cytochrome C, in agreement with experimental evidence. 

On closer inspection, however, we can see differences between the four structures 
in the cutoff energies in which changes in rigidity take place. For example, in structures 
IHRC and IWEJ, the terminal a-helical sequences remain rigid down to cutoff values 
below —3 kcal/mol, while in ICRC and 1U75 these sequences are already largely 
flexible at a cutoff value of —2 kcal/mol. We plot the mainchain rigidity of these four 
proteins as a function of cutoff energy during dilution in Figure [6fa). The differences 
in energy scale of the rigidity loss is now clearly visible. Note in particular that in the 
energy range around —0.1 to —0.6 kcal/mol, two of the structures retain mainchain 
rigidity (/s > 0.9) while the other two have already dropped to /s < 0.5. 

In Figure O we now consider mitochondrial cytochrome C structures (from tuna) 
which differ only in their heme-group metal content and are structurally very similar 
(RMSD values given in Table O^). We see that the dilution plots for the tuna protein 
have similar shapes and indeed are quite similar to those for the horse protein (Figure 
[4|). There are differences, however: in particular, in structure 1154 the o;- helical region 
at residues 60-70 remains rigid to lower cutoff values then that at residues 90-100, 
which would disagree with the "folding core" prediction of reference [22]. We would 
therefore argue that physical conclusions drawn from rigidity analysis should be based 
on the comparison of as many structures as possible if they are to be robust. 

Once we plot mainchain rigidity as a function of cutoff energy we again observe 
differences in the energy scales at which rigidity is lost, (Figure [6)3). The greatest 
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E (kcal/mol) E (kcal/mol) 



(a) Horse cythochromes (b) Tuna cythochromes 

Figure 6. (a) Mainchain rigidity as a function of hydrogen bond energy cutoff 
E during dilution for four horse mitochondrial cytochrome C structures. Note 
that for cutoff energy values in the region of —0.5 kcal/mol, structure IHRC and 
IWEJ are almost completely rigid while structures 1U75 and ICRC are less than 
50% rigid, (b) Mainchain rigidity for four tuna cytochrome C structures. Note 
the considerable differences in behaviour between, for example, 5CYT and 1155 
in the —1 to —2 kcal/mol energy range, even though the structures differ from 
each other only slightly. 



discrepancy appears in the energy range from —1 to —2 kcal/mol; here the 5CYT 
structure has ~ 0.4 while 1155 has f^ ~ 0.9, although the structures differ by less 
than d = 0.3A in RMSD. 

3.2. Variability of energy scales and selection of cutoff values 

It is clear from our investigation of cytochrome C structures that the rigidity analysis 
at a given cutoff value on very similar structures can easily produce different results. 
This is not because the dilution plots for these structures differ drastically in their 
shape, but rather because the cutoff energy at which a major change in rigidity takes 
place can differ by approximately 1 kcal/mol between very similar structures. This 
sensitivity of cutoff energy scales to small structural variations is understandable if 
we consider, for example, the distance dependence for the hydrogen-bond energy 
function [19]; we show in Figure [1] that a variation in donor- acceptor distance of 
only O.lA can shift the hydrogen bond energy by around 1 kcal/mol. Thus while the 
hydrogen bond energy function is successful in distinguishing weaker from stronger 
bonds, its resolution is limited to approximately 1 kcal/mol. 

This implies that exact values of the hydrogen bond cutoff energy cannot easily 
be transferred between different crystal structures. Rather, it is advisable to perform 
rigidity dilution on the specific protein structure(s) of interest and to observe how 
the rigidity changes as the weaker bonds are eliminated, and which portions of the 
structure are most stable, before selecting an appropriate cutoff value for further 
investigation of the rigidity /flexibility of the structure(s). While this is an a sense 
the implicit message of the wide variety of cutoff values used in previous studies (see 
section p^ppendix A[ ) we believe the point should be made explicitly for the benefit of 
potential users of the method. 
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1A7V 
1155 
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y^2YCC 




-4 -3 -2 

£ (kcal/mol) 

(a) Cytochrome C 



E (kcal/mol) 

(b) Myoglobin 




-3 -2 -1 

E (kcal/mol) 

(c) o-Lactalbumin 



E (kcal/mol) 
(d) Hemoglobin (A chain) 




(e) HIV-1 protease 
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E (kcal/mol) 

(f) Trypsin 







lAOJ 




iKii 












1AQ7 


A A 
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lAUJ 
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IKIM 












lAVW 


V V 


IKIN 














lAVX 


> t> 


IKIO 
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v-v 


1AZ8 
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ILDT 
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- 
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Figure 7. Rigidity dilutions for different families of proteins: cytochrome C, 
myoglobin, a-lactalbumin, hemoglobin, HIV-1 protease and trypsin. We can see 
that proteins can display either a "gradual" (a-d) or a "sudden" (e— f) pattern of 
rigidity loss. 



3. 3. Patterns of rigidity loss 

For the cytochromes that we have so far considered (|3.1[) . the general pattern is 
one of gradual rigidity loss, particularly for \E\ > 1. This indicates a hierarchy of 
stability in the rigid clusters, with some areas being rigidified by very weak hydrogen 
bonds, some by bonds of medium strength and some by the strongest bonds. This is 
reminiscent of the gradual or second-order rigidity transition observed in some glassy 
networks [31], specifically those with a wide diversity in their constraint distribution. 
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Glassy networks with less diverse constraint networks, however, show a sudden, first- 
order-like rigidity transition in which the structure passes between largely flexible and 
largely rigid states on the addition or removal of only a few constraints. 

In Figure[7]we show the patterns of rigidity loss for six different families of proteins 
as listed in Table [1] Our sample falls into two classes, those displaying a gradual 
pattern of rigidity loss (Figure [71 (a) cytochrome C, (b) myoglobin, (c) lactalbumin, 
and (d) hemoglobin) and those displaying a sudden loss of rigidity once weak bonds 
are eliminated (Figure [71 (e) HIV-1 protease and (f) trypsin). For proteins in this 
second class, all the 25 structures that we examine display rapid loss of mainchain 
rigidity as weak bonds are removed and the mainchain has become almost entirely 
flexible once the cutoff energy is reduced below —2 kcal/mol. This indicates that the 
rigidity of clusters in these proteins is due to weaker hydrogen bonds and we do not 
see (as we do in the other four proteins) the persistence of rigid clusters bound by 
stronger hydrogen bonds. 

The HIV-1 protease is a natural homodimer and we consider the rigidity of the 
dimer, as in [19]. For the other protein families our data is obtained from single protein 
chains; for example, for hemoglobin we analyse a-hemoglobin chains. It should be clear 
that a protein chain treated in isolation always has fewer constraints then when treated 
as part of a complex, and indeed we find that the individual chains from the HIV-l 
protease structure are even less rigid than the entire dimer (data in Supplementary 
Materials). For the case of hemoglobin, we can confirm that the rigidity of the 
isolated A-chain seen in Figure [TJd) differs only slightly from the rigidity of the same 
chain when analysed as part of the full tetrameric hemoglobin structure (data in 
Supplementary Materials). Consideration of isolated HIV-1 protease monomers, or of 
full hemoglobin tetrameric complexes, thus does not alter their classification in terms 
of gradual or sudden loss of rigidity. 

Comparison of these six protein families thus leads us to the conclusion that 
protein structures, like glassy networks, can display two distinct patterns of rigidity 
loss depending on the diversity of their constraint networks. We have identified two 
families of proteins, HIV protease and trypsin, whose members display rapid loss of 
rigidity as weaker hydrogen bonds are eliminated, in contrast to four other families of 
proteins which display a gradual loss of rigidity indicating a hierarchy of hydrogen- 
bond strengths in the constraints that maintain protein rigidity. 

4. Conclusion and outlook 

Our motivation in this study was twofold: to clarify a methodological issue in the 
use of rigidity analysis on protein structures, by determining the robustness of RCDs 
against small structural variations and the significance of the cutoff energy value, and 
to obtain an insight into the patterns of rigidity loss during hydrogen-bond dilution, 
by comparison with the observed patterns in glassy networks. 

On the first point, we find that there is considerable variation in the RCDs of 
structurally similar proteins during dilution. Figure [6l for example, shows that among 
a group of cytochrome C structures drawn from similar eukaryotic mitochondria, 
energy cutoffs in the range from to —2 kcal/mol (such as have typically been used for 
First/Froda simulations of flexible motion [24,25,27,28]) can produce a wide range 
of degrees of mainchain flexibility. We conclude that the results of rigidity analysis 
on individual crystal structures should not be over-interpreted as being "the" RCD 
for a protein. The hydrogen-bond energy function in First is quite sensitive to small 
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structural variations; while it successfully divides weaker from stronger bonds, it is 
not possible to identify a unique value of the hydrogen bond cutoff energy which can 
be applied to all protein structures to give meaningful results. Rather, each protein 
structure should first be subjected to rigidity dilution to produce a dilution plot; a 
suitable value of the cutoff energy can then be chosen to test a specific hypothesis 
about the rigidity and flexibility of the protein. Similarly, when physical significance 
is attached to the pattern of rigidity loss [22] , then multiple similar examples of a given 
protein structure should be studied in order to be robust against structural variation. 

On the second point, we find that proteins can display either gradual (second- 
order-like) or sudden (first-ordcr-like) patterns of rigidity loss during dilution. We find 
sudden rigidity loss in two proteases, eukaryotic trypsin and viral HIV-1 protease. 
Both consist largely of /3-sheet secondary structure with little a-helical content 
compared to the other proteins in our set, which may account for their different 
rigidity behaviour. Previous work [21] has emphasised the analogy between the rigidity 
transitions of proteins and of glassy networks; we have now found that the two distinct 
patterns of rigidity transition recently identified in glassy networks [31] are also seen 
in proteins. 

Our results in this paper suggest several avenues for further enquiry. The rigidity 
of protein monomers extracted from complexes should be systematically compared 
with their rigidity within the complex, which will be affected by interchain interactions. 
The robustness of flexible motion simulations based on rigidity analysis using different 
cutoff values must also be investigated. A recent study of the flexible motion of myosin 
[29] found that the flexible motion of the myosin structure appeared qualitatively 
similar over a wide range of cutoff values covering both highly flexible and more rigid 
structures. This suggests that rigidity analysis retains its value as a natural coarse- 
graining for simulations even if the rigidity behaviour during dilution is as variable as 
we have found. 
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Appendix A. CutofT values in previous studies using First 

Jacobs et al. [19] comment that the results of First analysis should not be sensitive 
to the typical "fluctuations known to occur within protein structures" . Their advice is 
that the cutoff should be at least —0.1 kcal/mol in order to eliminate a large number 
of very weak hydrogen bonds with energies in the range from 0.0 to —0.1 kcal/mol, 
and that a natural choice is near the "room temperature" energy of —0.6 kcal/mol. 
As we have seen in section 13.21 this criterion is not sufficient to avoid sensitivity to 
small structural variations. 

Rader et al. [21] consider the protein folding transition by monitoring (r) (mean 
number of bonded neighbours per atom) during rigidity dilution; they do not, however, 
comment on the hydrogen bond energy values. Hespenheide et al. [22] identify 
the protein folding core with "the set of secondary structure that remain rigid the 
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longest in the simulated denaturation" , without regard to the exact values of the 
cutoff energy at which rigidity is lost. Here the cutoff energy is used qualitatively to 
distinguish weaker from stronger bonds. In considering the rigidity of virus capsid 
protein complexes, Hespenheide et al. [20] make use of a cutoff of —0.35 kcal/mol, 
a value chosen so that capsid protein dimers would be flexible while the inner ring 
of proteins in a pentamer of dimers would be rigid, and draw conclusions about the 
rigidity of other multimeric complexes. Meanwhile, Hemberg et al. [26] use a different 
cutoff of —0.7 kcal/mol in a study on the dynamics of capsid assembly. 

The Froda geometric simulation algorithm [24] makes use of the RCD generated 
by First as a coarse-graining. Simulations of protein mobility using First/Fro da 
have tended to use cutoff values that are systematically lower than in applications of 
First alone; typically —1 kcal/mol or lower [24,25,27-29], as cutoff values closer to 
zero seem to include too many constraints to allow large-scale motion to occur. In a 
paper on the combination of rigidity analysis and elastic network modelling, Gohlke 
et al. [30] discuss RCDs of two protein crystal structures but do not specify a cutoff 
value, though the Froda mobility simulations given in Figure 3a of that paper were 
performed using a cutoff of —1.5 kcal/mol and give an excellent match to experimental 
data from NMR ensembles. 
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